1.2 Trigram Statistical Language Model

Authors

  • Thomas J. Watson
  • Raymond Lau
Abstract

First and foremost, I would like to thank Salim Roukos at IBM’s Thomas J. Watson Research Center and Ronald Rosenfeld at Carnegie Mellon University. The work presented here was conducted while I was a visitor at TJ Watson during the summer and fall of 1993, and it extends my previous work on trigger language models, done at TJ Watson the preceding summer. Salim was my supervisor and mentor throughout my stays at TJ Watson. Roni Rosenfeld and I worked together on my earlier trigger language model exploration at IBM. Without Roni, many of the implementation issues related to maximum entropy would not have been resolved as efficiently. I would also like to thank Peter Brown and Robert Mercer, presently at Renaissance Technology, and Stephen Della Pietra and Vincent Della Pietra, presently at TJ Watson. Peter, Bob, Stephen, Vincent and Salim jointly conceived and developed the maximum entropy framework for building statistical models at TJ Watson. Finally, I cannot forget Victor Zue, my thesis supervisor, who took time out of his busy schedule to offer his valuable input.


Similar resources

Applying Statistical English Language Modeling to Symbolic Machine Translation

The PANGLOSS Mark III system [Frederking et al. 94] was from the outset designed to be a symbolic, human-aided machine translation (MT) system. The need arose to rapidly adapt it for use as a fully-automated MT system. Our solution to this problem was to add a statistical English language model (ELM) to replace the most significant user activity, selecting between alternate translations produce...


Using Generalized Language Model for Question Matching

Question answering is one of the popular services on the World Wide Web. The main goal of these services is to find the best answer to a user's input question as quickly as possible. To achieve this aim, most of them use new techniques for question matching. We have a lot of question and answering services on the Persian web, so it seems that developing a question matching m...


A unified approach to statistical language modeling for Chinese

This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigrams to Chinese is challenging because (1) there is no standard definition of words in Chinese, (2) word boundaries are not marked by spaces, and (3) there is a dearth of training data. Our unified approach automatically and consistently gathers a high-quality training data set...


Speech, Hearing and Language: work in progress Volume 14 STUDIES IN THE STATISTICAL MODELLING OF DIALOGUE TURN PAIRS IN THE BRITISH NATIONAL CORPUS

This article describes some preliminary investigations into the statistical properties of the transcribed dialogues that were collected for the British National Corpus of English. Our aim has been to look for evidence of linguistic structure which could be used to build better statistical language models for spontaneous human-human dialogues. We have concentrated on pairs of successive, relativ...


Statistical Analysis of Multilingual Text Corpus and Development of Language Models

This paper presents two studies: first, a statistical analysis of three languages (Hindi, Punjabi and Nepali), and second, the development of language models for three Indian languages (Indian English, Punjabi and Nepali). The main objective of this study is to find distinctions among these languages and to develop language models for their identification. Detailed statistical analysis h...


Improved Language Modeling for Statistical Machine Translation

Statistical machine translation systems use a combination of one or more translation models and a language model. While there is a significant body of research addressing the improvement of translation models, the problem of optimizing language models for a specific translation task has not received much attention. Typically, standard word trigram models are used as an out-of-the-box component ...




Publication date: 1994